Skip to content

Booster: tunable in-place equation evaluation at rewritten subterms (--equation-max-local-steps)#4153

Merged
ehildenb merged 8 commits into
masterfrom
equation-local-fixpoint
Jun 11, 2026
Merged

Booster: tunable in-place equation evaluation at rewritten subterms (--equation-max-local-steps)#4153
ehildenb merged 8 commits into
masterfrom
equation-local-fixpoint

Conversation

@ehildenb

@ehildenb ehildenb commented Jun 10, 2026

Copy link
Copy Markdown
Member

This PR generalizes booster's equation evaluation strategy. Today, when an equation rewrites a subterm during the bottom-up simplification traversal, re-normalization happens by restarting whole-term passes from the top, so a rewrite chain advances one step per pass. This PR adds a per-pass budget, --equation-max-local-steps N: after a rewrite, the result is re-simplified in place (LLVM evaluation plus re-entering the bottom-up traversal) for up to N steps per pass before deferring back to the global restart loop. N=0 — the default — is exactly the existing restart-only behavior as the degenerate case of the same code path, so this changes nothing out of the box; the knob lets users tune how deep in-place simplification chains may run.

Changes:

  • New server option --equation-max-local-steps (default 0, identical to current behavior); the budget resets every global pass.
  • Booster.Pattern.ApplyEquations: on a rewrite in the bottom-up pass, localFixpointEval re-enters the cached traversal in place, counting re-entries against the budget.
  • Termination stays fully bounded: at most --equation-max-iterations passes times (budget + 1) applications per chain, with two-layer loop detection (per-step chain check inside the budget, whole-term snapshots across passes, which catch cycles longer than the budget).
  • Unit tests pin the default-0 boundary (identical to the old iteration-limit behavior) and the budgeted behaviors (deeper chains, per-step loop detection, combined bound) in an explicit options window.

Validation:

  • At the default, the unit suite and rpc-integration tests are unchanged, including the test-log-simplify-json log-trace golden — the default is trace-identical to master.
  • KEVM dose-response sweep (15 biggest-moving specs, N ∈ {0,1,2,3,4,5,20,100,1000}, all specs passing in every configuration): budgets 1–3 give about −30% total wall-clock (1530 s at N=0 → 1042 s at N=2), with per-spec wins of −30…−50% saturating at N=1–2.
  • The sweep also shows why this is a knob and not yet a new default: at N≥4 one spec family (ecrecoverloop02) regresses +40–50%, because an in-place chain completes a second nested memory write before the restart-pass concat rules flatten the first, and booster hands the resulting shape to Kore (Kore-side equation applications jump 2 980 → 26 477). The regression is an indicator of missing simplification lemmas / preserves-definedness annotations, not of the strategy.
  • Re-running the sweep with an extended lemma set that fills those gaps removes the cliff entirely: improvement becomes monotone in the budget, with the best total at N=20 (−35% vs N=0). Once the lemma set keeps simplification in-house, re-simplifying at the same subterm until done is a net performance win. This is the extended lemmas set: Simplification lemmas to improve Haskell booster coverage evm-semantics#2859. On that PR, without this change KEVM has some proofs that have major regressions. Coupled with this PR, and setting N=20, we get only performance wins there.

ehildenb and others added 5 commits June 10, 2026 17:39
…attern/ApplyEquations: local-fixpoint evaluation behind --equation-local-fixpoint

iterateEquations restarts the full term traversal whenever any node changes, and within a pass a successful application's RHS is not re-evaluated at its position. Each firing therefore triggers a whole-term sweep in which every ancestor of the changed node is a novel term and parents match against unevaluated RHS redexes — a sweep multiplier on equation attempts that grows with the length of causal rewrite chains.

Behind the new --equation-local-fixpoint flag (default off, for A/B measurement), the BottomUp equation pass wraps its evaluation hook: when a node is rewritten, run the LLVM pass on the result (preserving the LLVM-before-equations ordering of the global loop) and re-enter the cached recursion on it, normalizing the new subtree in place. Ancestors then see children in final form and the global loop converges in 1-2 passes instead of one per causal chain step. Whole-term snapshots no longer catch node-level oscillations, so the chain of local rewrites along the current path is tracked in a path-scoped localSteps stack (saved/restored around each local recursion, reset for nested predicate evaluation) and checked per step, throwing EquationLoop; maxIterations keeps counting global passes only, so TooManyIterations semantics and its partial result are unchanged. traverseTerm itself is untouched (it is shared with the LLVM and path-condition passes).

Validation: with the flag on, a causal chain of depth 101 that exceeds the global-pass limit today normalizes in a single pass, the loopDef oscillation is still detected as EquationLoop, and standard scenarios give identical results. The flag-window test is sequenced after the iteration-limit test it would otherwise race with (the test binary runs tests in parallel). Full unit suite passes; test-retain-condition-cache passes with byte-identical output with --equation-local-fixpoint enabled.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…attern/ApplyEquations: replace the local-fixpoint flag with a per-pass budget, on by default (--equation-max-local-steps)

The Bool flag forked evaluation into two strategies and left the in-place recursion unbounded (a single pass could apply arbitrarily many equations, with loop detection as the only brake). Replace it with a budget: at most maxLocalSteps in-place rewrites per traversal pass (--equation-max-local-steps, default 20, chosen on the expectation that a few applications normalize a side condition to its final truth value). On exhaustion, rewritten nodes are returned without recursion — exactly the restart-only strategy — and the global loop picks them up on the next pass. Restart-only evaluation is therefore the budget-0 special case of a single code path rather than a separate mode, the new strategy is the default, and total work is again bounded by maxIterations passes times (budget plus one application per node), with TooManyIterations and its partial result reached as before.

Loop detection becomes two-layered: the path-scoped per-step check catches oscillations shorter than the budget immediately, and cycles that survive a pass boundary recur in the existing whole-term snapshots within at most one cycle period of passes (a period dividing the budget repeats the snapshot on the very next pass).

Validation: bottom-up chains beyond the old one-application-per-pass limit complete with defaults; a lowered pass limit with the default budget still aborts an over-long chain with a partial result; budget 0 reproduces the old depth-100/101 success/abort boundary exactly; oscillations are detected per local step with the budget on and by whole-term snapshots with it off. Full unit suite passes; test-retain-condition-cache produces byte-identical output with the default budget.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… binding, ambiguous record updates)

The --pedantic CI build (-Werror) rejected the leftover
isTooManyIterations helper, unused since the iteration-limit test moved
into the budget-0 window, and the EquationOptions record updates, which
are ambiguous under DuplicateRecordFields because the field names are
shared with EquationConfig. Construct the options explicitly instead.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…log golden for in-place evaluation

In-place evaluation attempts the INT.lt hook directly at the rewritten
constraint subterm (logged failure on the symbolic argument) and the
later simplify pass no longer re-attempts it (cache hit). Log-trace
changes only; all RPC responses are unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…efault --equation-max-local-steps to 0 (restart-only)

Make the in-place budget opt-in so the default behavior is identical to
the existing restart-only evaluation strategy. The unit tests now pin
the legacy iteration-limit boundary at the default and exercise the
in-place behaviors (deeper chains, per-step loop detection, combined
bound) in an explicit budget-20 window. The test-log-simplify-json
golden reverts to the upstream trace, guarding that the default changes
nothing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@ehildenb ehildenb changed the title Local-fixpoint equation evaluation Booster: tunable in-place equation evaluation at rewritten subterms (--equation-max-local-steps) Jun 10, 2026
@ehildenb ehildenb marked this pull request as ready for review June 10, 2026 23:41
@ehildenb ehildenb requested a review from jberthold June 10, 2026 23:41
…tart loop and the --equation-max-local-steps budget

The booster description's equation section now explains the
restart-from-top pass structure that was previously implicit (and
described as hard-coded), and how the in-place budget generalises it.
Also fixes a typo (traverser) in the strategy bullets.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@jberthold jberthold left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Results look quite good 🎉
but I would probably change a few details here, not least to make it easier to understand the code and the algorithm it implements.

Comment on lines +204 to +205
, localSteps = []
, localStepCount = 0

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have the invariant that length localSteps == localStepCount ?
If that is true then we might not need a separate counter.. but it might be good to have it for large maxLocalSteps instead of computing length many times.
I did not find a place where the localSteps are reset in the iteration, only the counter is reset.

EDIT: I found where the stack is popped, and the localStepCount is different from what I thought. I'd make it per level to get the mentioned invariant and make it easier to see what is going on.

@ehildenb ehildenb Jun 11, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 3def8ed — the counter is now restored together with the stack around the recursion, so localStepCount == length localSteps is a state invariant; the counter only exists to avoid computing length per step.

Comment on lines +409 to +410
-- each pass gets a fresh in-place rewriting budget
eqState . modify $ \s -> s{localStepCount = 0}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First I was a bit confused here because
a) only the counter is reset here, not the localSteps stack
b) should the counter not be reset at every level of the tree?

EDIT: After commenting on the localFixPointEval I understand that the counter may be counting evaluations of the entire tree during one iteration. I'd make it per level, though.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 3def8ed — the per-pass reset is gone; the counter is path-scoped alongside localSteps (per level, as you suggested), so the budget bounds each in-place chain's depth.

{ localSteps = t' : s.localSteps
, localStepCount = s.localStepCount + 1
}
result <- llvmSimplify t' >>= recurse

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea here to repeat the evaluation of t at this level of the AST ? Or all levels from the bottom up up to this level again?
The code is currently doing the latter because above, simp is passed as the recurse parameter.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All levels, and that is deliberate: the rewrite builds a new subterm from the rule's RHS whose arguments need full evaluation before equations at this level (or above) can match decisively. The descent is cut short by the equation cache / isEvaluated at the substituted subject parts, so after the first step of a chain the re-entry costs one cache lookup per argument. Documented in the comment on localFixpointEval in 3def8ed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The examples I looked at would not benefit from additional simplification at just the top-level, they require stepping into the arguments of the functions that are produced on the RHS, evaluating those, and then evaluating the top-level again.

Comment on lines +422 to +423
let onEval
| direction == BottomUp = localFixpointEval simp

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As commented in localFixpointEval, passing simp here (with BottomUp direction) means that the entire subtree from a node downwards is evaluated repeatedly. Maybe this is cut short because of an isEvaluated flag (see cached definition) so it is quick, but still a traversal.
The intention was to repeat evaluation of the current node in the AST, not re-evaluate all nodes below, so this should probably just use onEval itself for the recursion.

Suggested change
let onEval
| direction == BottomUp = localFixpointEval simp
let onEval
| direction == BottomUp = localFixpointEval onEval

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We considered this but kept the full re-entry: re-applying at the node only would leave the RHS's new argument redexes unevaluated this pass, making matches at the node (and at ancestors) indeterminate — for function equations that is an abort, not a skip. The cache already prevents re-traversal of unchanged parts. Rationale added to the localFixpointEval comment in 3def8ed.

Comment on lines +476 to +480
eqState . modify $ \s ->
s
{ localSteps = t' : s.localSteps
, localStepCount = s.localStepCount + 1
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since st has been captured, the code does not have to use modify here. Also, the counter could be pre-evaluated to avoid a thunk (but it will be forced in each iteration anyway).

Suggested change
eqState . modify $ \s ->
s
{ localSteps = t' : s.localSteps
, localStepCount = s.localStepCount + 1
}
let !newCount = st.localStepCount + 1
eqState $ put
st
{ localSteps = t' : st.localSteps
, localStepCount = newCount
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 3def8ed

, localStepCount = s.localStepCount + 1
}
result <- llvmSimplify t' >>= recurse
eqState . modify $ \s -> s{localSteps = st.localSteps}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line effectively pops the localSteps stack but does not reset the counter.
A subsequent call to evaluate an AST node upward of this one will inherit the counter. I think the counter could be reset to st.localStepCount after recurse as well, then it would be st.localStepCount == length st.localSteps as a state invariant.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 3def8ed — the counter is reset to st.localStepCount after recurse, establishing the invariant.

ehildenb and others added 2 commits June 11, 2026 12:58
…cope the in-place budget (localStepCount == length localSteps)

Per review: restore the counter together with the chain around the
recursion, making the budget bound each in-place chain's depth rather
than the total per pass, with localStepCount == length localSteps as a
state invariant (the counter exists only to avoid computing the length
per step). The per-pass reset in the global loop becomes redundant and
is removed. Also use put on the captured state with a pre-forced
counter instead of modify, and align the option help and docs with the
per-chain budget semantics.

The in-place recursion deliberately re-enters the full bottom-up
traversal of the rewritten result (rather than re-applying at the
rewritten node only): the rewrite builds a new subterm from the rule's
RHS whose arguments need full evaluation, and the equation cache cuts
the descent short at already-normal substituted subject parts.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@ehildenb ehildenb merged commit 5679e50 into master Jun 11, 2026
6 checks passed
@ehildenb ehildenb deleted the equation-local-fixpoint branch June 11, 2026 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants